A Bayesian analysis of pentaquark signals from CLAS data 
We examine the results of two measurements by the CLAS collaboration, one of which claimed
evidence for a Θ⁺ pentaquark, whilst the other found no such evidence. The unique feature of these
two experiments was that they were performed with the same experimental setup. Using a Bayesian
analysis we find that the results of the two experiments are in fact compatible with each other, but
that the first measurement did not contain sufficient information to determine unambiguously the
existence of a Θ⁺. Further, we suggest a means by which the existence of a new candidate particle
can be tested in a rigorous manner.

PACS numbers: 13.60.Rj; 12.39.Mk; 14.20.Jn; 14.80.-j; 02.50.-r; 02.70.Uu 
The debate about the existence of the S = +1 Θ⁺(1540) baryon
state is still ongoing, in spite of results from dedicated,
high-luminosity measurements. One of these [1], from the CLAS
collaboration at the Thomas Jefferson National Accelerator
Facility, used the reaction γd → pK⁺K⁻n. It showed convincing
evidence that production cross sections for such a state are
nowhere near the levels implied by an earlier CLAS measurement
[2] of the same channel, which had seen a peak in the pK⁻
missing mass spectrum at 1.542 GeV/c² with a 5.2σ statistical
significance. The salient point is that the work of Ref. [1]
was a dedicated, high-luminosity repeat of Ref. [2], where the
experimental running conditions were as similar as practically
possible.

In the whole history of Θ⁺ pentaquark searches, there
were several independent experiments that claimed to 
have found evidence, whilst a similar number claimed 
to have found nothing. It is impractical to examine the 
results of all such experiments in a consistent fashion, 
but the similarity of the two CLAS experiments provides 
us with an ideal opportunity to investigate apparently 
contradictory results. 

One can examine in detail whether any discrepancy 
arose from the data quality of the two experiments by 
making systematic tests on, for example, the effects of 
different cuts. In the original work for both measure- 
ments, however, parallel analyses were carried out to 
confirm the final spectra, and different internal reviews 
verified the correctness of the analysis procedures. We 
therefore assume that the quality of the data in both 
the experiments was consistent, and that the analyses of 
both experiments were carried out correctly. We concentrate
solely on the end-points of the analyses: namely, the events
passing all cuts, which contribute to missing mass spectra.

To get a feel for the problem, we took the data set from
Ref. [1] (hereafter referred to as "g10" after the CLAS
running period in which the data were obtained), which
had been analyzed in exactly the same way as the data
from Ref. [2] (hereafter referred to as "g2a"). The g10
data contained just under six times as many events,
which could be directly compared. The g10 data were
then split into five independent subsamples, each containing
the same number of counts as the g2a data set,
and pK⁻ missing mass spectra were produced. These
missing mass spectra are where a Θ⁺ might be
expected to appear. The g10 subsample spectra are depicted
in figure 1a-e, and the g2a spectrum is depicted in
figure 1f.

Peak-like features appear in several of the g10 subsamples,
but the shapes are by no means consistent. As
mentioned previously and in keeping with current convention,
the g2a result quoted a "significance" of about
5σ, which was similar to other experiments claiming evidence
of discovery. However, 5σ means that the probability
that a feature is a fluctuation is of the order of
10⁻⁶. This is a very small number; it does not appear to
match the relative ease of generating peak-like features in
the subsample spectra. How do we quantify the intuitive
feeling that the odds of obtaining the observed g2a peak
from fluctuations are not as small as 1 in 10⁶?
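
For orientation, the tail probability behind an "nσ" statement can be evaluated directly. The following is a minimal sketch, not part of the original analysis: it assumes a purely Gaussian fluctuation, and the function name is hypothetical.

```python
import math

def gaussian_tail_probability(n_sigma, two_sided=False):
    """Probability that a Gaussian variable fluctuates by at least n_sigma.

    Assumption: the test statistic is exactly Gaussian, which real
    missing mass spectra with few counts need not satisfy.
    """
    one_sided = 0.5 * math.erfc(n_sigma / math.sqrt(2.0))
    return 2.0 * one_sided if two_sided else one_sided

p_one_sided = gaussian_tail_probability(5.0)
p_two_sided = gaussian_tail_probability(5.0, two_sided=True)
print(f"one-sided: {p_one_sided:.2e}, two-sided: {p_two_sided:.2e}")
```

For n = 5 both conventions give values between 10⁻⁷ and 10⁻⁶, which is the scale of the number questioned in the text.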

FIG. 1: (Color online) pK⁻ missing mass spectra from the
five g10 subsamples, panels (a) to (e), and the original g2a
data, panel (f). The data are sorted into bins of width
10 MeV/c².

In this letter we attempt to address this problem within
a Bayesian analysis framework, and to suggest an alternative
means of quantifying the evidence for discovery.
What is specifically required is a quantitative comparison
between two hypotheses: "the spectrum contains a
peak", and "the spectrum does not contain a peak". One
can model the shape of a spectrum as the addition of simple
functions, provided that they appear to describe the
shape of the spectrum reasonably well, and have plausible
physical origins (e.g. Gaussians for resolution effects,
etc.). We refer to these as "data models", to distinguish
them from theoretical models. The posterior probability
that a data model (M) is true given some observed data
(D) is given by Bayes' theorem,



$$ P(M \mid D) = \frac{P(D \mid M)\, P(M)}{P(D)}, \tag{1} $$



where P(D | M) is the probability of the data being observed
given the model, and P(M) represents the prior
probability of the model being correct. P(D) is a normalizing
constant, which will cancel out in the ratio that
compares the posterior probabilities of two models.

Now the data model will depend on some parameters
ξ, and the posterior probability of these taking on specific
values is

$$ P(\xi \mid D, M) = \frac{P(D \mid \xi, M)\, P(\xi \mid M)}{P(D \mid M)}, \tag{2} $$



where P(D | ξ, M) is the probability of the data being
observed given the model and its parameters, and
P(ξ | M) is the prior probability of the parameters. Fitting
parameters to data is a matter of maximizing this
posterior. The quantity in the denominator of Eq. (2)
is known as the evidence for a model and is obtained by
marginalizing (integrating) over the parameters:

$$ P(D \mid M) = \int d\xi \, P(D \mid \xi, M)\, P(\xi \mid M). \tag{3} $$



Since the evidence is an integral over the model param- 
eters, it implicitly implements Occam's razor. Evidence 
ratios provide a balance between favouring on the one 
hand the simpler model, and on the other hand the model 
that better fits the data. 

We construct two very simple data models of the miss- 
ing mass spectra obtained from experiment: 

• Model M_0: The spectrum can be described by a
third-order polynomial in the region of interest. This
represents the assumption that there is no new particle.
A third-order polynomial was employed in the
original analysis to model the background shape.
This model depends on four parameters.

• Model M_P: The spectrum can be described by a
"narrow" Gaussian peak sitting atop a third-order
polynomial background in the region of interest.
"Narrow" in this case means that the width is
significantly less than the region of interest in the
mass spectrum. This model depends on seven parameters.
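
The two data models can be written down compactly in code. The sketch below is illustrative only: the parameter ordering, the (amplitude, mean, width) parameterization of the Gaussian, and all function names are assumptions, since the text specifies only the functional forms and the number of parameters.

```python
import math

def background(m, coeffs):
    """Third-order polynomial background; coeffs = (a0, a1, a2, a3)."""
    a0, a1, a2, a3 = coeffs
    # Horner form of a0 + a1*m + a2*m**2 + a3*m**3
    return a0 + m * (a1 + m * (a2 + m * a3))

def model_m0(m, params):
    """No-peak model: four polynomial coefficients."""
    return background(m, params[:4])

def model_mp(m, params):
    """Peak model: four polynomial coefficients plus a Gaussian
    described by (amplitude, mean, width), seven parameters in total."""
    amplitude, mean, width = params[4:7]
    peak = amplitude * math.exp(-0.5 * ((m - mean) / width) ** 2)
    return background(m, params[:4]) + peak
```

With the amplitude set to zero, `model_mp` reduces to `model_m0`, which makes explicit that the peak model contains the background-only model as a special case.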

To compare the different models, a ratio of their prob- 
abilities in the light of data can be formed: 



$$ R_E = \frac{P(M_P \mid D)}{P(M_0 \mid D)} = \frac{P(D \mid M_P)}{P(D \mid M_0)} \times \frac{P(M_P)}{P(M_0)}, \tag{4} $$



where Bayes' theorem has been used to obtain the final 
expression. This is the ratio of evidences for the mod- 
els multiplied by the ratio of prior probabilities of the 
models. If there is no prior preference for either model, 
the final factor is unity, so the ratio of model probabilities
becomes a ratio of evidences. R_E is known as the
"Bayes' Factor" or "evidence ratio".

It is computationally convenient and equivalent to ex- 
amine the logarithms of the evidence ratios: 



$$ \ln(R_E) = \ln P(D \mid M_P) - \ln P(D \mid M_0). \tag{5} $$



Determining what value of ln(R_E) to use in deciding between
data models is somewhat arbitrary, but Jeffreys
established [3] a rough evidence scale versus written
descriptors: |ln(R_E)| < 1 is weak, 1 < |ln(R_E)| < 2.5 is
substantial, 2.5 < |ln(R_E)| < 5 is strong and |ln(R_E)| > 5
is decisive. Model comparison is thus quantified by R_E
which, as constructed, is such that data favouring a data
model with a peak have positive ln(R_E).
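
The scale above maps directly onto a small lookup. The following sketch uses the thresholds quoted in the text; the function name and the treatment of values lying exactly on a boundary are assumptions.

```python
def jeffreys_descriptor(ln_re):
    """Return (descriptor, favoured model) for a log evidence ratio.

    Thresholds follow the scale quoted in the text. Values exactly on
    a boundary are assigned to the stronger category (an assumption).
    """
    strength = abs(ln_re)
    if strength < 1.0:
        label = "weak"
    elif strength < 2.5:
        label = "substantial"
    elif strength < 5.0:
        label = "strong"
    else:
        label = "decisive"
    favoured = "peak (M_P)" if ln_re > 0 else "no peak (M_0)"
    return label, favoured

print(jeffreys_descriptor(-0.41))
print(jeffreys_descriptor(5.78))
```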

To evaluate evidences, we see from Eq. (3) that an integral
over a likelihood P(D | ξ, M) and a prior P(ξ | M)
is required. We calculate the likelihood by evaluating
for each bin in a spectrum an "ideal" number of counts,
S_i(ξ), for a given set of parameters. The probability of
this being correct given the measured counts is calculated
using a Poisson distribution. The total likelihood
is then a product of these probabilities for each bin:

$$ P(D \mid \xi, M) = \prod_i \frac{S_i^{D_i} \exp(-S_i)}{D_i!}, \tag{6} $$

where D_i denotes the measured number of counts in bin i.
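
In practice the product of per-bin Poisson probabilities is evaluated as a sum of logarithms to avoid numerical underflow. A minimal sketch follows, with `observed` holding the measured counts per bin and `expected` the "ideal" counts S_i from a data model; both names are illustrative.

```python
import math

def poisson_log_likelihood(observed, expected):
    """Log of the product over bins of Poisson(d_i | s_i).

    observed: measured counts per bin (non-negative integers)
    expected: ideal counts per bin predicted by the data model
    """
    total = 0.0
    for d_i, s_i in zip(observed, expected):
        if s_i <= 0.0:
            # A non-positive expectation is treated as excluded
            # (a convention chosen for this sketch).
            return -math.inf
        # ln(s^d e^{-s} / d!) = d ln s - s - ln Gamma(d + 1)
        total += d_i * math.log(s_i) - s_i - math.lgamma(d_i + 1)
    return total
```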

Here, the prior probability is constructed by assuming 
no initial correlations between parameters, so it is sim- 
ply a product of priors for each separate parameter. We 
assume that each prior is a uniform distribution between 
a lower and upper limit since this represents the least 
initial bias. The prior parameter ranges were established 
by performing an initial fit and setting the limits to be 
±50% of the values found. This resulted in a large flexi- 
bility in the shapes of both background and peak. 
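
The prior construction described above can be sketched as follows. The helper names are hypothetical, and the sketch assumes every initial fit value is non-zero, since a ±50% window around zero would have no width.

```python
import math

def uniform_prior_limits(fit_values, fraction=0.5):
    """Lower and upper limits at +/- fraction of each initial fit value."""
    limits = []
    for value in fit_values:
        if value == 0:
            raise ValueError("a zero fit value gives a zero-width prior")
        half_width = fraction * abs(value)
        limits.append((value - half_width, value + half_width))
    return limits

def log_uniform_prior(params, limits):
    """Log of a product of independent uniform priors."""
    log_p = 0.0
    for x, (lower, upper) in zip(params, limits):
        if not lower <= x <= upper:
            return -math.inf  # outside the allowed range
        log_p -= math.log(upper - lower)
    return log_p
```

Because the priors are independent, the joint log-prior is simply the sum of the per-parameter terms.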

To perform the integrations over the many parameters
in the models, we utilized the technique of "nested
sampling" developed by Skilling [4, 5]. Essentially, this
is a Monte Carlo integration method developed specifically
for Bayesian data analysis. We refer the reader to
the original reference for details, and to Ref. [6] for an
example application.
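
To convey the idea, a toy version of nested sampling is sketched below. It is not the implementation used for the results in this letter: it replaces the worst live point by simple rejection sampling from the prior, which is only practical for low-dimensional toy problems, and the function names, the number of live points and the number of iterations are all illustrative assumptions.

```python
import math
import random

def log_add(a, b):
    """Return log(exp(a) + exp(b)) without overflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))

def nested_sampling_log_evidence(log_likelihood, sample_prior,
                                 n_live=200, n_iter=1200, rng=None):
    """Estimate the log evidence with a basic nested sampling loop."""
    rng = rng or random.Random(0)
    live = [sample_prior(rng) for _ in range(n_live)]
    live_logl = [log_likelihood(p) for p in live]
    log_z = -math.inf
    log_x_prev = 0.0  # log of the enclosed prior mass, starting at X = 1
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=live_logl.__getitem__)
        logl_min = live_logl[worst]
        log_x = -i / n_live  # expected shrinkage of the prior mass
        # log of the shell width X_{i-1} - X_i
        log_w = log_x_prev + math.log1p(-math.exp(log_x - log_x_prev))
        log_z = log_add(log_z, logl_min + log_w)
        # Replace the worst point by a prior draw with higher likelihood.
        while True:
            candidate = sample_prior(rng)
            candidate_logl = log_likelihood(candidate)
            if candidate_logl > logl_min:
                break
        live[worst], live_logl[worst] = candidate, candidate_logl
        log_x_prev = log_x
    # Add the contribution of the remaining live points.
    for logl in live_logl:
        log_z = log_add(log_z, logl + log_x_prev - math.log(n_live))
    return log_z

def toy_log_likelihood(x):
    """Unnormalized Gaussian of width 0.1 centred at 0.5."""
    return -0.5 * ((x - 0.5) / 0.1) ** 2

log_z = nested_sampling_log_evidence(toy_log_likelihood,
                                     lambda rng: rng.random())
exact = math.log(0.1 * math.sqrt(2.0 * math.pi)
                 * math.erf(0.5 / (0.1 * math.sqrt(2.0))))
print(f"nested sampling: {log_z:.3f}, exact: {exact:.3f}")
```

The toy problem has a uniform prior on [0, 1], so the exact evidence is available in closed form for comparison. The estimate scatters from run to run, which is why repeated independent calculations are needed to quote an uncertainty.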

We applied the model comparison framework to all the
spectra shown in figure 1. In addition we analyzed the
spectra shown in figure 2, which consisted of: (a) the
full g10 spectrum; (b) a "fake" spectrum, constructed by
sampling from a combination of signal and background
functions in the data model with the peak (M_P), which
had the same signal-to-background ratio as the g2a spectrum.
This was done to show what the results of this
analysis would have been, had a resonance been there;
(c) and (d) pK⁻ missing mass spectra from the g2a and
g10 data sets, but showing the Λ(1520) signal, in order
to test how the technique fared for the case of a
well-established particle.

The results are quoted in table I, and displayed
graphically in figure 3. We omit the results for the
Λ(1520) from the figure, as they would render the scale
unusable. To estimate the uncertainty in the Monte
Carlo integrals, we ran at least 20 independent calculations
for each spectrum analysed. The errors listed in
the table represent the standard error of the samples.
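
Combining repeated Monte Carlo runs into a value and an error can be sketched as below. The interpretation of "standard error of the samples" as the standard error of the mean is an assumption, and the function name is hypothetical.

```python
import math
import statistics

def mean_and_standard_error(values):
    """Mean of repeated estimates and the standard error of that mean."""
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 in the denominator) over sqrt(n)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem
```

Here `values` would hold the ln(R_E) results of the independent nested sampling runs for one spectrum.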




FIG. 2: (Color online) Missing mass histograms for the Θ⁺ from
a) g10, b) fake, and for the Λ(1520) from c) g10, and d) g2a data.
The horizontal axes show the missing mass M_m in GeV/c².
The data in a) and b) are sorted into bins of width 10 MeV/c²,
and the bins in c) and d) have width 5 MeV/c².

With the splitting of the g10 data set, we have shown
(figure 1) the relative ease with which one can obtain a
peak-like feature, given a small number of events. The
evidence ratios calculated for the individual subsamples
in g10 generally suggest a bias against a peak, which
perhaps mirrors an intuitive feeling about how signifi- 
cant such features really are. However, two of the five 
subsamples (2 and 4) are compatible with the "weak" 
category, meaning that the results are essentially incon- 
clusive. Whilst the g2a result is more of an outlier, it also 
falls in the weak category and is inconclusive; the results 
of the two measurements are therefore compatible with 
each other. 



Data sample     | ln(R_E)
----------------+----------------
g10 sample 1    |  -1.56 ± 0.07
g10 sample 2    |  -1.09 ± 0.13
g10 sample 3    |  -1.64 ± 0.09
g10 sample 4    |  -1.11 ± 0.11
g10 sample 5    |  -1.82 ± 0.07
g10 full        |  -2.87 ± 0.11
g2a             |  -0.41 ± 0.10
fake            |   5.78 ± 0.27
g2a Λ(1520)     |  96.70 ± 0.70
g10 Λ(1520)     | 549.12 ± 2.17

TABLE I: Evidence ratios. Calculations are done by nested
sampling, hence the need to include standard errors.

FIG. 3: Graphical representation of the values of evidence
ratios from table I on a logarithmic scale. The horizontal
lines correspond to the limits of the regions associated with
the different descriptors of the Jeffreys scale.

The ln(R_E) value for g2a (-0.408) indicates weak evidence
in favour of the data model without a peak in the
spectrum. What this means is that whilst a data model 
including a peak gives a better fit by eye to the spectrum, 
it does not compensate for having had to introduce addi- 
tional parameters for the peak. This is Occam's razor in 
action; simpler models are preferable unless more com- 
plex models do much better. One must be careful what 
to conclude from the g2a spectrum, however, since the 
evidence ratio does not conclusively rule out a peak; it is 
simply inconclusive. 

We now turn to the question of whether the g10 experiment
could conclusively discriminate between the two
possibilities. The log of the evidence ratio for the full g10
spectrum is -2.9. This makes it strong evidence against a
peak in the spectrum. Another way of looking at this is
that with this evidence ratio, the odds against a peak in
this spectrum are about 17 to 1. Whilst this cannot completely
rule out a discovery, another measurement of this
channel is probably not necessary. By comparison, the
odds in favour of a peak in the fake spectrum are about
320 to 1, meaning that had a signal really been there in
g10, the experimental result would have been decisive.
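
The odds quoted above follow from exponentiating the log evidence ratio. A short sketch, assuming equal prior probabilities for the two data models so that the posterior odds equal the evidence ratio; the function name is illustrative.

```python
import math

def odds_from_log_evidence_ratio(ln_re):
    """Return (odds, favoured hypothesis) for a log evidence ratio.

    Assumes equal prior probabilities for the two data models.
    """
    favoured = "peak" if ln_re > 0 else "no peak"
    return math.exp(abs(ln_re)), favoured

for label, ln_re in [("g10 full", -2.87), ("fake", 5.78)]:
    odds, favoured = odds_from_log_evidence_ratio(ln_re)
    print(f"{label}: about {odds:.0f} to 1 in favour of '{favoured}'")
```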

The study of the Λ(1520) shows that when a resonance
is there, this method picks it out rather readily, with both
g2a and g10 data sets yielding a decisive result. We take
this as a positive test that our method works. 



In summary, we have applied a Bayesian model com- 
parison method to analyzing the missing mass spectra 
produced in pentaquark searches. This has been used to 
study the relationship between the results of two CLAS 
measurements, which were taken under almost identical 
conditions. We have shown that there is no conflict be- 
tween the results of the two experiments, and that the 
low number of counts in the first experiment resulted in 
an ambiguous signal. Furthermore we have shown that 
the g10 result shows strong evidence against the discovery
of a pentaquark in this channel. More generally, this
method could be applied to any data set where a search 
for a new state has been carried out, and can provide a 
quantitative measure with which to judge whether or not 
a result represents a discovery. 